Smart Paper Notes

ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

Zhengyong Huang , Sijuan Zou , Guoshuai Wang , Zixiang Chen , Hao Shen , Haiyan Wang , Na Zhang , Lu Zhang , Fan Yang , Haining Wang

Categories: Computer Vision

2022-11-04

Accurate, automated tumor segmentation plays an important role in both clinical practice and radiomics research. Segmentation in medicine is still often performed manually by experts, which is laborious, expensive, and error-prone. Manual annotation relies heavily on the experience and knowledge of these experts, and there is considerable intra- and interobserver variation. It is therefore of great significance to develop a method that can automatically segment tumor target regions. In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of PET or CT in detecting tumors; it uses multi-scale convolution operations to extract feature information, highlighting tumor region location information and suppressing non-tumor region location information. In addition, our network uses dual-channel inputs in the encoding stage and fuses them in the decoding stage, which takes advantage of the differences and complementarities between PET and CT. We validated the proposed ISA-Net method on two clinical datasets, a soft tissue sarcoma (STS) dataset and a head and neck tumor (HECKTOR) dataset, and compared it with other attention methods for tumor segmentation. DSC scores of 0.8378 on the STS dataset and 0.8076 on the HECKTOR dataset show that ISA-Net achieves better segmentation performance and better generalization. Conclusions: The proposed method performs tumor segmentation on multimodal medical images and can effectively exploit the differences and complementarity of the different modalities. With proper adjustment, it can also be applied to other multimodal or single-modal data.

Translated by Google Translate
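
The DSC figures quoted in the ISA-Net abstract refer to the Dice similarity coefficient, which measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric on flattened binary masks. The function name `dice_coefficient` and the convention that two empty masks score 1.0 are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels marked as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of tumor voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]
    reference = [0, 0, 1, 1, 1, 0]
    # 2 * 2 / (3 + 3), about 0.667
    print(dice_coefficient(predicted, reference))
```

A 3D volume can be scored the same way after flattening it to a single sequence of 0/1 voxel labels.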

EMC2A-Net: An Efficient Multibranch Cross-channel Attention Network for SAR Target Classification

Xiang Yu , Zhe Geng , Xiaohua Huang , Qinglu Wang , Daiyin Zhu

Categories: Computer Vision

2022-08-03

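In recent years, convolutional neural networks (CNNs) have shown great potential in synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and contain texture features at different scales, such as speckle noise, dominant target scatterers, and target contours, which are rarely considered in traditional CNN models. This paper proposes two residual blocks, namely EMC2A blocks with multiscale receptive fields (RFs), based on a multibranch structure, and then designs an efficient isotopic-architecture deep CNN (DCNN), EMC2A-Net. The EMC2A blocks use parallel dilated convolutions with different dilation rates, which can effectively capture multiscale context features without significantly increasing the computational burden. To further improve the efficiency of multiscale feature fusion, this paper proposes a multiscale feature cross-channel attention module, namely the EMC2A module, which adopts a local multiscale feature interaction strategy without dimensionality reduction. This strategy adaptively adjusts the weight of each channel through an efficient one-dimensional (1D) circular convolution and a sigmoid function to guide attention at the global channel-wise level. Comparative results on the MSTAR dataset show that EMC2A-Net outperforms existing models of the same type and has a relatively lightweight network structure. Ablation results show that the EMC2A module significantly improves model performance using only a few parameters and appropriate cross-channel interactions.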
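
The channel-weighting step described in the EMC2A-Net abstract (a 1D circular convolution across channels followed by a sigmoid, with no dimensionality reduction) can be sketched roughly as follows. This is an illustrative approximation only: the global average pooling step, the function names, and the kernel values are assumptions, and the multiscale branches of the actual module are not modeled.

```python
import math
from typing import List, Sequence, Tuple


def global_average_pool(feature_maps: Sequence[Sequence[float]]) -> List[float]:
    """Reduce each channel (a flat list of activations) to its mean."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def circular_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation with wrap-around padding; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd so it can be centered")
    n, half = len(signal), len(kernel) // 2
    return [
        sum(kernel[j] * signal[(i + j - half) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def channel_attention(
    feature_maps: Sequence[Sequence[float]], kernel: Sequence[float]
) -> Tuple[List[float], List[List[float]]]:
    """Return per-channel weights in (0, 1) and the reweighted feature maps."""
    pooled = global_average_pool(feature_maps)   # one descriptor per channel
    mixed = circular_conv1d(pooled, kernel)      # local cross-channel interaction
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]  # sigmoid gate
    scaled = [[w * x for x in channel] for w, channel in zip(weights, feature_maps)]
    return weights, scaled


if __name__ == "__main__":
    # Three channels with two activations each; the kernel values are hypothetical.
    maps = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    weights, scaled = channel_attention(maps, kernel=[0.25, 0.5, 0.25])
    print(weights)
```

Because the convolution wraps around, the first and last channels interact with each other just as neighboring channels do, and the number of channel descriptors is never reduced.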

A new database of Houma Alliance Book ancient handwritten characters and its baseline algorithm

Xiaoyu Yuan , Zhibo Zhang , Yabo Sun , Zekai Xue , Xiuyan Shao , Xiaohua Huang

Categories: Computer Vision | Artificial Intelligence

2022-07-13

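The Houma Alliance Book is one of the national treasures of the Shanxi Museum in China. It has great historical significance for the study of ancient history. To date, research on the Houma Alliance Book has remained at the level of identifying paper documents, which is inefficient for identification and makes the material difficult to display, study, and publicize. Digitizing the recognized ancient characters of the Houma Alliance Book can therefore effectively improve the efficiency of recognizing ancient characters and provide more reliable technical support and text data. This paper proposes a new database of Houma Alliance Book ancient handwritten characters. The database contains 297 classes and 3,547 samples of Houma Alliance ancient handwritten characters, collected from the original book collection and from human imitative writing. Furthermore, a decision-level classifier fusion strategy is used to fuse three well-known deep neural network architectures for ancient handwritten character recognition. Experiments are performed on our new database. The experimental results first provide the research community with baseline results for the new database and then demonstrate the efficiency of our proposed method.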
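
Decision-level fusion, as mentioned in the Houma Alliance Book abstract, combines the outputs of several classifiers rather than their internal features. The abstract does not state the fusion rule, so the sketch below assumes simple averaging of class probabilities across three models; the function names and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def fuse_decisions(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several classifiers."""
    if not prob_sets:
        raise ValueError("at least one classifier output is required")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all classifiers must score the same set of classes")
    n_models = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_decisions(prob_sets)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical softmax outputs of three networks for one character image.
    model_a = [0.6, 0.3, 0.1]
    model_b = [0.2, 0.5, 0.3]
    model_c = [0.1, 0.6, 0.3]
    print(predict([model_a, model_b, model_c]))  # class 1 wins after averaging
```

In this example the first model alone would pick class 0, but the fused decision follows the two models that agree on class 1.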

Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

Dazhou Guo , Jia Ge , Xianghua Ye , Senxiang Yan , Yi Xin , Yuchen Song , Bing-shen Huang , Tsung-Min Hung , Zhuotun Zhu , Ling Peng

Categories: Computer Vision

2021-11-01

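Accurate organ-at-risk (OAR) segmentation is critical for reducing post-treatment complications of radiotherapy. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, because of the predictably prohibitive labor cost of this task, most institutions choose a substantially simplified protocol, delineating a smaller subset of OARs and neglecting the dose distributions associated with the other OARs. In this work, we propose a novel, automated, and highly effective stratified OAR segmentation (SOARS) system that uses deep learning to precisely delineate a comprehensive set of 42 H&N OARs. SOARS stratifies the 42 OARs into anchor, mid-level, and small & hard subcategories, with a neural network architecture derived specifically for each category using neural architecture search (NAS) principles. We built the SOARS models using 176 training patients from an internal institution and independently evaluated them on 1327 external patients across six different institutions. In each institutional evaluation, SOARS consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score (a 36% relative error reduction in other metrics). More importantly, extensive multi-user studies clearly demonstrated that 98% of the SOARS predictions need only very minor or no revisions for direct clinical acceptance (saving 90% of radiation oncologists' workload), and that their segmentation and dosimetric accuracy are within or smaller than the inter-user variation. These findings confirm the strong clinical applicability of SOARS to the OAR delineation process in H&N cancer radiotherapy workflows, with improved efficiency, comprehensiveness, and quality.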

Deep reinforcement learning for portfolio management

Gang Huang , Xiaohua Zhou , Qingyang Song

Categories: Machine Learning

2020-12-26

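In this paper, we apply a deep reinforcement learning approach to optimize investment decisions in portfolio management. We make several innovations, such as adding a short-selling mechanism and designing an arbitrage mechanism, and we apply our model to optimize decisions for several randomly selected portfolios. The experimental results show that our model is able to optimize investment decisions and to obtain excess returns in the stock market, and that the optimized agent keeps the asset weights at fixed values throughout the trading period and trades at a very low transaction cost rate. In addition, we redesign the formula for calculating portfolio asset weights during continuous trading so that it supports leveraged trading, which fills the theoretical gap in calculating portfolio weights when shorting.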

Galaxy Image Classification using Hierarchical Data Learning with Weighted Sampling and Label Smoothing

Xiaohua Ma , Xiangru Li , Ali Luo , Jinqu Zhang , Hui Li

Categories: Machine Learning

2022-12-20

With the development of a series of galaxy sky surveys in recent years, the volume of observations has increased rapidly, making machine learning methods for galaxy image recognition a hot research topic. Existing automatic galaxy image recognition studies are hampered by large differences in similarity between categories, by the imbalance of data between classes, and by the discrepancy between the discrete representation of galaxy classes and the essentially gradual change from one morphological class to the adjacent class (DDRGC). These limitations have motivated several astronomers and machine learning experts to design projects with improved galaxy image recognition capabilities. This paper therefore proposes a novel learning method, "Hierarchical Imbalanced data learning with Weighted sampling and Label smoothing" (HIWL). HIWL consists of three key techniques that address the three problems above, respectively: (1) a hierarchical galaxy classification model based on an efficient backbone network; (2) a weighted sampling scheme to deal with the imbalance problem; (3) a label smoothing technique to alleviate the DDRGC problem. We applied this method to galaxy photometric images from the Galaxy Zoo-The Galaxy Challenge, exploring the recognition of completely round smooth, in-between smooth, cigar-shaped, edge-on, and spiral galaxies. The overall classification accuracy is 96.32%, and HIWL shows some advantages in recall, precision, and F1-score compared with related works. In addition, we explored visualizations of the galaxy image features and model attention to understand the foundations of the proposed scheme.
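
Two of the HIWL ingredients named above, weighted sampling and label smoothing, have simple textbook forms that can be sketched as follows. The abstract does not give the exact weighting scheme or smoothing factor, so inverse class frequency and `epsilon = 0.1` are assumptions here, as are the function names and the toy labels.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def smooth_labels(class_index: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Uniform label smoothing: (1 - epsilon) on the true class plus epsilon / K on every class."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    target = [epsilon / num_classes] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


def inverse_frequency_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-sample sampling weights proportional to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


if __name__ == "__main__":
    # Toy, imbalanced label list using two of the five classes from the abstract.
    labels = ["spiral"] * 8 + ["cigar-shaped"] * 2
    weights = inverse_frequency_weights(labels)
    rng = random.Random(0)
    # Draw a batch of sample indices; both classes are now equally likely per draw.
    batch = rng.choices(range(len(labels)), weights=weights, k=6)
    print([labels[i] for i in batch])
    # Soft target for class index 2 out of the five morphological classes.
    print(smooth_labels(2, num_classes=5))
```

With these weights each class receives the same total sampling mass regardless of its size, while the smoothed targets leave a small probability on the neighboring classes instead of a hard one-hot label.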

FlexiViT: One Model for All Patch Sizes

Lucas Beyer , Pavel Izmailov , Alexander Kolesnikov , Mathilde Caron , Simon Kornblith , Xiaohua Zhai , Matthias Minderer , Michael Tschannen , Ibrahim Alabdulmohsin , Filip Pavetic

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-15

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
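
The core idea in the FlexiViT abstract, randomizing the patch size at training time, can be illustrated by sampling a patch size per training step and computing the resulting sequence length. The sketch below is not the FlexiViT code: the 240-pixel image side, the candidate patch sizes, and the function names are assumptions, and a real model would also have to adapt its patch-embedding and position-embedding parameters to each sampled size, which is not shown.

```python
import random
from typing import List, Tuple


def valid_patch_sizes(image_size: int, candidates: List[int]) -> List[int]:
    """Keep only patch sizes that tile a square image of side image_size exactly."""
    return [p for p in candidates if image_size % p == 0]


def sample_patch_config(
    image_size: int, candidates: List[int], rng: random.Random
) -> Tuple[int, int]:
    """Pick a random patch size and return it with the resulting number of tokens."""
    sizes = valid_patch_sizes(image_size, candidates)
    if not sizes:
        raise ValueError("no candidate patch size divides the image size")
    patch = rng.choice(sizes)
    tokens = (image_size // patch) ** 2  # patches per side, squared
    return patch, tokens


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical settings: smaller patches give longer sequences and more compute.
    for step in range(3):
        patch, tokens = sample_patch_config(240, [8, 12, 16, 30, 48], rng)
        print(f"step {step}: patch size {patch}, sequence length {tokens}")
```

The sequence length grows quadratically as the patch size shrinks, which is the speed/accuracy tradeoff the abstract refers to.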

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Xi Chen , Xiao Wang , Soravit Changpinyo , AJ Piergiovanni , Piotr Padlewski , Daniel Salz , Sebastian Goodman , Adam Grycner , Basil Mustafa , Lucas Beyer

Categories: Computer Vision | Natural Language Processing

2022-09-14

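Effective scaling and a flexible task interface enable large language models to excel at many tasks. PaLI generates text based on visual and textual inputs and, with this interface, performs many vision, language, and multimodal tasks in many languages. To train PaLI, we make use of large pretrained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and to leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train the largest ViT to date (ViT-e) to quantify the benefits of even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art results on multiple vision and language tasks (such as captioning, visual question answering, and scene-text understanding) while retaining a simple, modular, and scalable design.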

Revisiting Neural Scaling Laws in Language and Vision

Ibrahim Alabdulmohsin , Behnam Neyshabur , Xiaohua Zhai

Categories: Machine Learning | Artificial Intelligence

2022-09-13

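The remarkable progress in deep learning in recent years has been driven largely by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting (interpolating) parameters. We then present a recipe for reliably estimating scaling law parameters from learning curves. We demonstrate that it extrapolates more accurately than previous methods across a wide range of architecture families in several domains, including image classification, neural machine translation (NMT), and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset comprising 90 evaluation tasks to facilitate research in this domain.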
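
The evaluation idea in the scaling-laws abstract is to judge a fitted curve by how well it predicts held-out larger scales, not by how well it fits the points it was trained on. The sketch below illustrates this with a plain power law fitted by least squares in log-log space. It is not the estimator proposed in the paper, whose functional form the abstract does not give; the function names and the synthetic data are assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x ** (-b) in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def extrapolation_rmse(xs: Sequence[float], ys: Sequence[float], n_fit: int) -> float:
    """Fit on the n_fit smallest scales and score log-space RMSE on the remaining larger ones."""
    if not 2 <= n_fit < len(xs):
        raise ValueError("need at least two fit points and one held-out point")
    a, b = fit_power_law(xs[:n_fit], ys[:n_fit])
    errors = [
        (math.log(a * x ** (-b)) - math.log(y)) ** 2
        for x, y in zip(xs[n_fit:], ys[n_fit:])
    ]
    return math.sqrt(sum(errors) / len(errors))


if __name__ == "__main__":
    # Synthetic learning curve: loss versus number of training examples, sorted by scale.
    sizes = [1e3, 1e4, 1e5, 1e6, 1e7]
    losses = [2.0 * s ** (-0.3) for s in sizes]
    print(fit_power_law(sizes[:3], losses[:3]))
    print(extrapolation_rmse(sizes, losses, n_fit=3))
```

On this noise-free synthetic curve the held-out error is essentially zero; on real learning curves a fit can interpolate the small scales well and still extrapolate poorly, which is what the held-out score is meant to expose.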

Appearance-guided Attentive Self-Paced Learning for Unsupervised Salient Object Detection

Huajun Zhou , Bo Qiao , Lingxiao Yang , Jianhuang Lai , Xiaohua Xie

Categories: Computer Vision

2022-07-13

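Existing deep-learning-based (DL-based) unsupervised salient object detection (USOD) methods learn saliency information in images based on the prior knowledge of traditional saliency methods and pretrained deep networks. However, these methods employ a simple learning strategy to train deep networks and therefore cannot properly incorporate the "hidden" information of the training samples into the learning process. Moreover, appearance information, which is crucial for segmenting objects, is used only as post-processing after network training. To address these two issues, we propose a novel appearance-guided attentive self-paced learning framework for unsupervised salient object detection. The proposed framework integrates self-paced learning (SPL) and appearance guidance into a unified learning framework. Specifically, for the first issue, we propose an Attentive Self-Paced Learning (ASPL) paradigm that organizes the training samples in a meaningful order to gradually mine more detailed saliency information. ASPL enables our framework to automatically produce soft attention weights that measure the learning difficulty of training samples in a purely self-learning way. For the second issue, we propose an Appearance Guidance Module (AGM), which formulates the local appearance contrast of each pixel as the probability of a saliency boundary and finds the potential boundary of the target objects by maximizing that probability. Furthermore, we extend the framework to other multimodal SOD tasks by aggregating the appearance vectors of other modalities, such as depth maps, thermal images, or optical flow. Extensive experiments on RGB, RGB-D, RGB-T, and video SOD benchmarks show that our framework achieves state-of-the-art performance against existing USOD methods and is comparable to the latest supervised SOD methods.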
